We consider the problem of maximizing a nonnegative submodular set function
$f : 2^{\mathcal{N}} \to \mathbb{R}_+$ subject to a $p$-matchoid constraint in the single-pass streaming
setting. Previous work in this context has considered streaming algorithms for
modular functions and monotone submodular functions. The main result is
for submodular functions that are non-monotone. We describe deterministic
and randomized algorithms that obtain an $\Omega(1/p)$-approximation using $O(k \log k)$
space, where $k$ is an upper bound on the cardinality of the desired set. The
model assumes value oracle access to $f$ and membership oracles for the matroids
defining the $p$-matchoid constraint.


1. Introduction 

Let $f : 2^{\mathcal{N}} \to \mathbb{R}$ be a set function defined over a ground set $\mathcal{N}$. $f$ is submodular if it exhibits
decreasing marginal values in the following sense: if $e \in \mathcal{N}$ is any element, and $A, B \subseteq \mathcal{N}$
with $A \subseteq B$ are any two nested sets, then $f(A + e) - f(A) \ge f(B + e) - f(B)$. The gap
$f(A + e) - f(A)$ is called the marginal value of $e$ with respect to $f$ and $A$, and denoted
$f_A(e)$. An equivalent characterization for submodular functions is that for any two sets
$A, B \subseteq \mathcal{N}$, $f(A \cup B) + f(A \cap B) \le f(A) + f(B)$.
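The decreasing-marginal-value condition can be checked by brute force on small ground sets. The following is a minimal sketch, not part of the paper: the toy data `COVERS` and `EDGES` and all function names are hypothetical, and the check ranges over elements $e$ outside $B$, which is the usual form of the condition when $f$ is not monotone.

```python
from itertools import combinations

# Toy data (hypothetical): COVERS[e] is the set of items that element e covers.
COVERS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {1, 4}}
# Toy undirected graph (hypothetical) used for a non-monotone example.
EDGES = [(1, 2), (2, 3), (1, 3), (3, 4)]


def coverage(S):
    """Monotone submodular: number of items covered by the elements of S."""
    covered = set()
    for e in S:
        covered |= COVERS[e]
    return len(covered)


def cut(S):
    """Non-monotone submodular: number of edges with exactly one endpoint in S."""
    return sum((u in S) != (v in S) for u, v in EDGES)


def marginal(f, A, e):
    """Marginal value f_A(e) = f(A + e) - f(A)."""
    A = frozenset(A)
    return f(A | {e}) - f(A)


def subsets(ground):
    """All subsets of `ground`, as frozensets."""
    ground = list(ground)
    for r in range(len(ground) + 1):
        for combo in combinations(ground, r):
            yield frozenset(combo)


def is_submodular(f, ground):
    """Check f_A(e) >= f_B(e) for all nested A <= B and all e outside B."""
    ground = list(ground)
    all_sets = list(subsets(ground))
    for B in all_sets:
        for A in all_sets:
            if not A <= B:
                continue
            for e in ground:
                if e not in B and marginal(f, A, e) < marginal(f, B, e):
                    return False
    return True
```

The cut function illustrates why non-monotone functions need care: adding a vertex can lower the value, so marginal values may be negative even though the function itself is nonnegative.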

Submodular functions play a fundamental role in classical combinatorial optimization,
where rank functions of matroids, edge cuts, coverage, and others are instances of submodular
functions (see [Sch03, Fuj05]). More recently, there has been considerable interest in constrained
submodular function optimization, driven both by theoretical progress and by a variety of
applications in computer science. The needs of the applications, and in particular the sheer
bulk of large data sets, have brought into focus the development of fast algorithms for
submodular optimization. Recent work on the theoretical side includes the development of
faster worst-case approximation algorithms in the traditional sequential model of computation
[BV14, IJB13, CJV15], algorithms in the streaming model [BMKK14, CK14], and algorithms in
the map-reduce model of computation [KMVV13].

‘Work on this paper supported in part by NSF grant CCF-1319376.

^Work on this paper supported in part by NSF grant CCF-1319376.

*Work on this paper supported in part by NSF grants CCF-1319376, CCF-1421231, and CCF-1217462.

In this paper we consider constrained submodular function maximization. The goal is
to find $\max_{S \in \mathcal{I}} f(S)$, where $\mathcal{I} \subseteq 2^{\mathcal{N}}$ is a downward-closed family of sets; i.e., $A \in \mathcal{I}$ and
$B \subseteq A$ implies $B \in \mathcal{I}$. $\mathcal{I}$ is also called an independence family and any set $A \in \mathcal{I}$ is called an
independent set. Submodular maximization under various independence constraints has been
extensively studied in the literature. The problem can be easily seen to be NP-hard even for
a simple cardinality constraint, as it encompasses standard NP-hard problems like the
Max-$k$-cover problem. Constrained submodular maximization has found several new applications in
recent years. Some of these include data summarization [LB11, SSSJ12, DKR13], influence
maximization in social networks [KKT03, CWY09, CWW10, GBL11, SS13], generalized
assignment [CCPV07], mechanism design [BIK07], and network monitoring [LKG+07].

In some of these applications, the amount of data involved is much larger than the main
memory capacity of individual computers. This motivates the design of space-efficient
algorithms which can process the data in streaming fashion, where only a small fraction of
the data is kept in memory at any point. There has been some recent work on submodular
function maximization in the streaming model, focused on monotone functions (i.e.,
$f(A) \le f(B)$ whenever $A \subseteq B$). This assumption is restrictive from both a theoretical and practical
point of view.

In this paper we present streaming algorithms for non-monotone submodular function
maximization subject to various combinatorial constraints, the most general being a
$p$-matchoid. $p$-matchoids generalize many basic combinatorial constraints such as the
cardinality constraint, the intersection of $p$ matroids, and matchings in graphs and
hypergraphs. A formal definition of a $p$-matchoid is given in Section 2. We consider the
abstract $p$-matchoid constraint for theoretical reasons, and most constraints in practice
should be simpler. We explicitly consider the cardinality constraint and obtain an improved
bound.

We now describe the problem formally. We are presented a ground set of elements
$\mathcal{N} = \{e_1, e_2, \ldots, e_n\}$, with no assumption made on the
order or the size of the data stream. The goal is to select an independent set $S \subseteq \mathcal{N}$ (where
independence is defined by the $p$-matchoid) that maximizes a nonnegative submodular
function $f$ while using as little space as possible. We make the following assumptions: (i) the
function $f$ is available via a value oracle that takes as input a set $S \subseteq \mathcal{N}$ and returns the
value $f(S)$; (ii) the independence family $\mathcal{I}$ is available via a membership oracle with some
additional information needed in the $p$-matchoid setting; and (iii) the constraints specify
explicitly, and a priori, an upper bound $k$ on the number of elements to be chosen. We
discuss these in turn. The availability of a value oracle for $f$ is a reasonable and standard
assumption in the sequential model of computation, but needs some justification in restrictive
models of computation such as streaming, where the goal is to store at any point of time
only a small subset of the elements of $\mathcal{N}$. Can $f(S)$ be evaluated without having access to
all of $\mathcal{N}$? This of course depends on $f$. [BMKK14] gives several examples of interesting and
useful functions where this is indeed possible. The second assumption is also reasonable
if, as we remarked, the $p$-matchoid constraint is in practice going to be a simple one that
combines basic matroids such as cardinality, partition and laminar matroid constraints that
can be specified compactly and implicitly. Finally, the third assumption is guided by the fact
that an abstract model of constraints can in principle lead to every element being chosen.
In many applications the goal is to select a small and important subset of elements from
a much larger set, and it is therefore reasonable to expect knowledge of an upper bound
on how many can be chosen. Submodular set functions are ubiquitous and arise explicitly
and implicitly in a variety of settings. The model we consider in this paper may not be
useful directly in some important scenarios of interest. Nevertheless, the ideas underlying
the analysis in the streaming model that we consider here may still be useful in speeding up
existing algorithms and/or reducing their space usage.

As is typical for streaming algorithms, we measure performance in four basic dimensions:
(i) the approximation ratio $f(S)/\mathrm{OPT}$, where $S$ is the output of the algorithm and OPT
is the value of an optimal solution; (ii) the space usage of the algorithm; (iii) the update
time, or the time required to process each stream element; and (iv) the number of passes the
algorithm makes over the data stream.

Our results. We develop randomized and deterministic algorithms that yield an
$\Omega(1/p)$-approximation for maximizing a nonnegative submodular function under a $p$-matchoid
constraint in the one-pass streaming setting. The space usage is $O(k \log k)$, essentially
matching recent algorithms for the simpler setting of maximizing a monotone submodular
function subject to a cardinality constraint [BMKK14]. The randomized algorithm achieves
better constants than the deterministic algorithm. As far as we are aware, we present
the first streaming algorithms for non-monotone submodular function maximization under
constraints beyond cardinality. We give an improved bound of $(1 - \epsilon)/(2 + e)$ for the cardinality
constraint. For the monotone case our bounds match those of Chakrabarti and Kale [CK14]
for a single pass; we give a self-contained algorithm and analysis. Table 1 summarizes our
results for a variety of constraints.
constraint    offline, monotone      offline, nonnegative   streaming, monotone   streaming, nonnegative
cardinality   1 - 1/e [NWF78]        1/e + .004 [BFNS14]    1/2 - ε [BMKK14]      (1 - ε)/(2 + e) (R,*)
matroid       1 - 1/e (R) [CCPV11]   … (R) [FNS11]          1/4 [CK14]            …


Table 1: Best known approximation bounds for submodular maximization. Bounds for randomized
algorithms that hold in expectation are marked (R). For hypergraph $b$-matchings and matroid
intersection, $p$ is fixed. In the results for $p$-matchoids, $o(1)$ goes to zero as $p$ increases. New bounds
attained in this paper are marked (*). All new bounds except for the cardinality constraint are the
first bounds for their class. The best previous bound for the cardinality constraint is about .0893,
by [BFS15].

A brief overview of techniques. Streaming algorithms for constrained modular and
submodular function optimization are usually clever variations of the greedy algorithm, which
picks elements in iterations to maximize the gain in each iteration locally while maintaining
feasibility. For monotone functions, in the offline setting, greedy gives a
$1/(p+1)$-approximation for the $p$-matchoid constraint and a $(1 - 1/e)$-approximation for the cardinality
constraint [FNW78]. The offline greedy algorithm cannot be directly implemented in streams,
but we outline two different strategies that are still greedy in spirit. For the cardinality
constraint, Badanidiyuru et al. [BMKK14] designed an algorithm that adds an element to
its running solution $S$ only if the marginal gain is at least a threshold of about $\mathrm{OPT}/2k$.
Although the quantity $\mathrm{OPT}/2k$ is not known a priori, they show that it lies in a small
and identifiable range, and can be approximated with $O(\log k)$ well-spaced guesses. The
algorithm then maintains $O(\log k)$ solutions in parallel, one for each guess. Another strategy
from Chakrabarti and Kale [CK14], based on previous work for matchings [FKM+05, McG05]
and matroid constraints [Bad11] with modular weights, considers deleting elements from
$S$ when adding a new element to $S$ is infeasible. More specifically, when a new element $e$
is encountered, the algorithm finds a subset $C \subseteq S$ such that $(S \setminus C) + e$ is feasible, and
compares the gain $f((S \setminus C) + e) - f(S)$ to a quantity representing the value that $C$ adds to
$S$. In the modular case, this may be the sum of weights of elements in $C$; for monotone
submodular functions, Chakrabarti and Kale used marginal values, fixed for each element
when the element is added to $S$, as proxy weights instead.
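The idea of covering an unknown threshold with well-spaced guesses can be illustrated with a geometric grid. This is a minimal sketch under stated assumptions rather than the procedure of [BMKK14]: the range `[lo, hi]` is assumed to be given, and both function names are hypothetical. When `hi / lo` is polynomial in $k$ and `eps` is fixed, the grid has $O(\log k)$ points.

```python
def geometric_guesses(lo, hi, eps):
    """Return the grid lo, lo*(1+eps), lo*(1+eps)^2, ... restricted to [lo, hi]."""
    if lo <= 0 or hi < lo or eps <= 0:
        raise ValueError("need 0 < lo <= hi and eps > 0")
    guesses = []
    g = lo
    while g <= hi:
        guesses.append(g)
        g *= 1 + eps
    return guesses


def nearest_guess_below(guesses, target):
    """Largest grid point g with g <= target.

    For any target in [lo, hi] this g satisfies g <= target < g * (1 + eps),
    so some guess is always within a (1 + eps) factor of the unknown value.
    """
    return max(g for g in guesses if g <= target)
```

A streaming algorithm built on this idea would run one copy of its thresholded subroutine per grid point and return the best of the resulting solutions.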

The non-monotone case is harder because marginal values can be negative even when
$f$ is nonnegative. The natural greedy algorithm fails even for the simple cardinality
constraint, and the best offline algorithms for nonnegative submodular maximization are
uniformly weaker (see Table 1). To this end, we adapt techniques from the recent work of
Buchbinder et al. [BFNS14] in our randomized algorithm, and techniques from Gupta et al.
[GRST10] for the deterministic version. Buchbinder et al. randomized the standard greedy
algorithm (for cardinality) by repeatedly gathering the top (say) $k$ remaining elements, and
then randomly picking only one of them. We adapt this to the streaming setting by adding
the top elements to a buffer $B$ as they appear in the stream, and randomly adding an
element from $B$ to $S$ only when $B$ fills up. What remains of $B$ at the end of the stream is
post-processed by an offline algorithm. Gupta et al. gave a framework for adapting any
monotone submodular maximization algorithm to nonnegative submodular functions, by first
running the algorithm once to generate one independent set $S_1$, then running the algorithm
again on the complement of $S_1$ to generate a second set $S_2$, and running an unconstrained
maximization algorithm on $S_1$ to produce a third set $S_3$, finally returning the best of $S_1$, $S_2$,
and $S_3$. Our deterministic streaming algorithm is a natural adaptation, piping the rejected
elements of one instance of a streaming algorithm directly into a second instance of the
same algorithm, and post-processing all the elements taken by the first streaming instance.
Both of our algorithms require that we limit the number of elements ever added to $S$, which
then limits the size of the input for the post-processor. This limit is enforced by the idea of
additive thresholds from [BMKK14] and a simple but subtle notion of value that ensures
the properties we desire.

Related work. There is substantial literature on constrained submodular function
optimization, and we only give a quick overview. Many of the basic problems are NP-hard, so
we will mainly focus on the development of approximation algorithms. The (offline) problem
$\max_{S \in \mathcal{I}} f(S)$ for various constraints has been extensively explored, starting with the early
work of Fisher, Nemhauser, and Wolsey on greedy and local search algorithms [NWF78, FNW78].
Recent work has obtained many new and powerful results based on a variety of methods
including variants of greedy [GRST10, BFNS14, BFNS12], local search [LMNS10, LSV10,
FW14], and the multilinear relaxation [CCPV11, KST13, BKNS12, CVZ11]. Monotone
submodular functions admit better bounds than non-monotone functions (see Table 1). For
a $p$-matchoid constraint, which is our primary consideration, an $\Omega(1/p)$-approximation can
be obtained for nonnegative functions. Recent work has also obtained new lower bounds
on the approximation ratio achievable in the oracle model via the so-called symmetry gap
technique [Von13]; this also yields lower bounds in the standard computational models
[DV12].

Streaming algorithms for submodular functions are a very recent phenomenon, with
algorithms developed recently for monotone submodular functions [BMKK14, CK14]. [BMKK14]
gives a $(1/2 - \epsilon)$-approximation for monotone functions under a cardinality constraint using
$O(k \log k / \epsilon)$ space. [CK14] focuses on more general constraints like intersections of $p$
matroids and rank-$p$ hypergraphs, giving an approximation of $1/4p$ using a single pass.
Their algorithm extends to multiple passes, with an approximation bound of $1/(p + 1 + \epsilon)$
with $O(\epsilon^{-3} \log p)$ passes. The main focus of [KMVV13] is on the map-reduce model, although
they claim some streaming results as well.

Related to the streaming model is the online model, in which elements arrive in an online
fashion and the algorithm is required to maintain a feasible solution $S$ at all times; each
element on arrival has to be processed, and any element which is discarded from $S$ at any
time cannot be added back later. Strong lower bounds can be shown in this model, and two
relaxations have been considered. In the secretary model, the elements arrive according to
a random permutation of the ground set and an element added to $S$ cannot be discarded
later. In the secretary model, constant factor algorithms are known for the cardinality
constraint and some special cases of a single matroid constraint [GRST10, BHZ13]. These
algorithms assume the stream is randomly ordered and their performance degrades badly
against adversarial streams; the best competitive ratio for a single general matroid is $O(\log k)$
(where $k$ is the rank of the matroid). Recently, Buchbinder et al. [BFS15] considered a
different relaxation of the online model where preemptions are allowed: elements added
to $S$ can be discarded later. Algorithms in the preemptive model are usually streaming
algorithms, but the converse is not true (although the one-pass algorithms in [CK14] are
preemptive). For instance, the algorithm in [BMKK14] maintains multiple feasible solutions
and our algorithms maintain a buffer of elements neither accepted nor rejected. The space
requirement of an algorithm in the online model is not necessarily constrained, since in
principle an algorithm is allowed to keep track of all the past elements seen so far. The
main result in [BFS15], as it pertains to this work, is a randomized 0.0893-competitive
algorithm for cardinality constraints using $O(k)$ space. As Table 1 shows, we obtain a
$(1 - \epsilon)/(2 + e)$-competitive algorithm for this case using $O(k \log k / \epsilon^2)$ space.

Paper organization. Section 2 reviews combinatorial definitions and introduces the notion 
of incremental values. Section 3 analyzes an algorithm that works for monotone submodular 
functions, and Section 4 adapts this algorithm to the non-monotone case. In Section 5, we 
give a deterministic streaming algorithm with slightly weaker guarantees. 

2. Preliminaries 

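Matroids. A matroid is a finite set system $\mathcal{M} = (\mathcal{N}, \mathcal{I})$, where $\mathcal{N}$ is a set and $\mathcal{I} \subseteq 2^{\mathcal{N}}$
is a family of subsets such that: (i) $\emptyset \in \mathcal{I}$, (ii) if $A \subseteq B \subseteq \mathcal{N}$ and $B \in \mathcal{I}$, then $A \in \mathcal{I}$,
(iii) if $A, B \in \mathcal{I}$ and $|A| < |B|$, then there is an element $b \in B \setminus A$ such that $A + b \in \mathcal{I}$.
In a matroid $\mathcal{M} = (\mathcal{N}, \mathcal{I})$, $\mathcal{N}$ is called the ground set and the members of $\mathcal{I}$ are called
independent sets of the matroid. The bases of $\mathcal{M}$ share a common cardinality, called the
rank of $\mathcal{M}$.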
Matchoids. Let $\mathcal{M}_1 = (\mathcal{N}_1, \mathcal{I}_1), \ldots, \mathcal{M}_q = (\mathcal{N}_q, \mathcal{I}_q)$ be $q$ matroids over overlapping ground
sets. Let $\mathcal{N} = \mathcal{N}_1 \cup \cdots \cup \mathcal{N}_q$ and $\mathcal{I} = \{S \subseteq \mathcal{N} : S \cap \mathcal{N}_\ell \in \mathcal{I}_\ell \text{ for all } \ell\}$. The finite set system
$\mathcal{M}^p = (\mathcal{N}, \mathcal{I})$ is a $p$-matchoid if for every element $e \in \mathcal{N}$, $e$ is a member of $\mathcal{N}_\ell$ for at most
$p$ indices $\ell \in [q]$. $p$-matchoids generalize matchings and intersections of matroids, among
others (see Figure 1).
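A membership oracle for a $p$-matchoid can be assembled from the oracles of its constituent matroids. The following is a minimal sketch, not taken from the paper: the classes `UniformMatroid` and `Matchoid` and the helper `matching_matchoid` are hypothetical names. It uses the standard observation that matchings in a graph form a 2-matchoid, with one rank-1 uniform matroid per vertex over the edges incident to that vertex.

```python
class UniformMatroid:
    """Matroid on `ground` whose independent sets have at most `rank` elements."""

    def __init__(self, ground, rank):
        self.ground = frozenset(ground)
        self.rank = rank

    def is_independent(self, S):
        S = frozenset(S)
        return S <= self.ground and len(S) <= self.rank


class Matchoid:
    """Set system whose independent sets are independent in every matroid."""

    def __init__(self, matroids):
        self.matroids = list(matroids)

    def p(self):
        """Smallest p for which this is a p-matchoid: the largest number
        of ground sets that any single element belongs to."""
        counts = {}
        for M in self.matroids:
            for e in M.ground:
                counts[e] = counts.get(e, 0) + 1
        return max(counts.values(), default=0)

    def is_independent(self, S):
        S = frozenset(S)
        # S is independent iff its restriction to each ground set is independent.
        return all(M.is_independent(S & M.ground) for M in self.matroids)


def matching_matchoid(edges):
    """Matchings of a graph: each vertex allows at most one incident edge."""
    vertices = {v for e in edges for v in e}
    return Matchoid(
        UniformMatroid([e for e in edges if v in e], 1) for v in vertices
    )
```

A cardinality constraint is the special case of a single uniform matroid, i.e. a 1-matchoid.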

Maximizing submodular functions under a $p$-matchoid constraint. Let $\mathcal{N}$ be a set of
elements, $f : 2^{\mathcal{N}} \to \mathbb{R}_{\ge 0}$ a nonnegative submodular function on $\mathcal{N}$, and $\mathcal{M}^p = (\mathcal{N}, \mathcal{I})$ a
$p$-matchoid for some integer $p$. We want to approximate $\mathrm{OPT} = \max_{S \in \mathcal{I}} f(S)$. There are
several polynomial-time approximation algorithms that give an $\Omega(1/p)$-approximation for this
problem, with better bounds for simpler constraints (see Table 1). These algorithms are used
as a black box called Offline, with approximation ratio denoted by $\gamma_p$: if Offline returns
$S \in \mathcal{I}$, then $E[f(S)] \ge \gamma_p \, \mathrm{OPT}$ (possibly without expectation, if Offline is deterministic).

Incremental Value. Let $\mathcal{N}$ be a ground set, and let $f : 2^{\mathcal{N}} \to \mathbb{R}$ be a submodular function.
For a set $S \subseteq \mathcal{N}$ and an element $e \in S$, what is the value that $e$ adds to $S$? One idea is
to take the margin $f_{S-e}(e) = f(S) - f(S - e)$ of adding $e$ to $S - e$. However, because $f$ is
not necessarily modular, we can only say that $\sum_{e \in S} f_{S-e}(e) \le f(S)$ without equality. It is
natural to ask for a different notion of value where the values of the parts sum to the value
of the whole.

Let $\mathcal{N}$ be an ordered set and $f : 2^{\mathcal{N}} \to \mathbb{R}$ be a set function. For a set $S \subseteq \mathcal{N}$ and element
$e \in \mathcal{N}$, the incremental value of $e$ in $S$, denoted $\nu(f, S, e)$, is defined as

\[ \nu(f, S, e) = f_{S'}(e), \quad \text{where } S' = \{s \in S : s < e\}. \]

The key point of incremental values is that they capture the entire value of a set. The
following holds for any set function.

Lemma 1. Let $\mathcal{N}$ be an ordered set, $f : 2^{\mathcal{N}} \to \mathbb{R}$ a set function, and $S \subseteq \mathcal{N}$ a set. Then
$f(S) = \sum_{e \in S} \nu(f, S, e)$.

Proof. Enumerate $S = \{e_1, \ldots, e_\ell\}$ in order, and let $S_i = \{e_1, \ldots, e_i\}$ denote the first $i$
elements in $S$. We have,

\[ \sum_{e_i \in S} \nu(f, S, e_i) = \sum_{e_i \in S} f_{S_{i-1}}(e_i) = f(S). \]

□
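The telescoping in Lemma 1 can be checked numerically. The sketch below is illustrative only: the toy graph `EDGES` and the helper names are hypothetical, elements are ordered by their integer labels, and the cut function satisfies $f(\emptyset) = 0$, so the telescoped sum $f(S) - f(\emptyset)$ equals $f(S)$. It also computes the leave-one-out sum $\sum_{e \in S} f_{S-e}(e)$ for contrast.

```python
from itertools import combinations

# Toy undirected graph (hypothetical); vertices are ordered by their labels.
EDGES = [(1, 2), (2, 3), (1, 3), (3, 4)]


def cut(S):
    """Number of edges with exactly one endpoint in S (non-monotone, cut(empty) = 0)."""
    return sum((u in S) != (v in S) for u, v in EDGES)


def incremental_value(f, S, e):
    """nu(f, S, e) = f_{S'}(e), where S' = {s in S : s < e}."""
    prefix = frozenset(s for s in S if s < e)
    return f(prefix | {e}) - f(prefix)


def incremental_sum(f, S):
    """Sum of incremental values over S; telescopes to f(S) - f(empty set)."""
    return sum(incremental_value(f, S, e) for e in S)


def leave_one_out_sum(f, S):
    """Sum of the margins f_{S-e}(e) = f(S) - f(S - e) over e in S."""
    S = frozenset(S)
    return sum(f(S) - f(S - {e}) for e in S)


def all_subsets(ground):
    ground = list(ground)
    for r in range(len(ground) + 1):
        for combo in combinations(ground, r):
            yield frozenset(combo)
```

On the full vertex set the incremental values are 2, 0, -1, -1, which sum to the cut value 0, while the leave-one-out margins sum to -8.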

When $f$ is submodular, we have decreasing incremental values analogous (and closely
related) to the decreasing marginal returns of submodular functions.

Lemma 2. Let $S \subseteq T \subseteq \mathcal{N}$ be two nested subsets of an ordered set $\mathcal{N}$, let $f : 2^{\mathcal{N}} \to \mathbb{R}$ be
submodular, and let $e \in \mathcal{N}$. Then $\nu(f, T, e) \le \nu(f, S, e)$.
Proof. Let $S' = \{s \in S : s < e\}$ and $T' = \{t \in T : t < e\}$. Since $S \subseteq T$, clearly $S' \subseteq T'$. We
have,

\[ \nu(f, T, e) = f_{T'}(e) \le f_{S'}(e) = \nu(f, S, e), \]

where the inequality follows by submodularity. □

The following is also an easy consequence of submodularity. 

Lemma 3. Let $\mathcal{N}$ be an ordered set of elements, let $f : 2^{\mathcal{N}} \to \mathbb{R}$ be a submodular function,
$S, Z \subseteq \mathcal{N}$ two sets, and $e \in S$. Then $\nu(f_Z, S, e) \le \nu(f, Z \cup S, e)$.

Proof. Let $Z' = \{z \in Z : z < e\}$ and $S' = \{s \in S : s < e\}$. By submodularity, we have,

\[ \nu(f_Z, S, e) = f_{Z \cup S'}(e) \le f_{Z' \cup S'}(e) = \nu(f, Z \cup S, e). \]

□
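Lemmas 2 and 3 can be checked exhaustively on a small instance. This is a minimal sketch with hypothetical names and a toy graph; as an assumption of the sketch, the Lemma 3 check takes $Z$ disjoint from $S$, which matches how the lemma is applied in Section 3 (there $Z = S \setminus C$ and the second set is $C$).

```python
from itertools import combinations

# Toy undirected graph (hypothetical); vertices are ordered by their labels.
EDGES = [(1, 2), (2, 3), (1, 3), (3, 4)]


def cut(S):
    """Number of edges with exactly one endpoint in S (submodular, non-monotone)."""
    return sum((u in S) != (v in S) for u, v in EDGES)


def incremental_value(f, S, e):
    """nu(f, S, e) = f_{S'}(e), where S' = {s in S : s < e}."""
    prefix = frozenset(s for s in S if s < e)
    return f(prefix | {e}) - f(prefix)


def contracted(f, Z):
    """The function f_Z(A) = f(Z + A) - f(Z)."""
    Z = frozenset(Z)
    return lambda A: f(Z | frozenset(A)) - f(Z)


def all_subsets(ground):
    ground = list(ground)
    for r in range(len(ground) + 1):
        for combo in combinations(ground, r):
            yield frozenset(combo)


def check_lemma_2(f, ground):
    """nu(f, T, e) <= nu(f, S, e) for all nested S <= T and all e."""
    sets = list(all_subsets(ground))
    return all(
        incremental_value(f, T, e) <= incremental_value(f, S, e)
        for T in sets for S in sets if S <= T for e in ground
    )


def check_lemma_3(f, ground):
    """nu(f_Z, S, e) <= nu(f, Z + S, e) for disjoint Z, S and all e in S."""
    sets = list(all_subsets(ground))
    return all(
        incremental_value(contracted(f, Z), S, e) <= incremental_value(f, Z | S, e)
        for Z in sets for S in sets if not (Z & S) for e in S
    )
```

A function that is not submodular, such as $|S|^2$, fails the Lemma 2 check, which gives a quick sanity test that the check is not vacuous.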


3. Streaming Greedy 

Let $\mathcal{M}^p = (\mathcal{N}, \mathcal{I})$ be a $p$-matchoid and $f$ a submodular function. The elements of $\mathcal{N}$ are
presented in a stream, and we order $\mathcal{N}$ by order of appearance. We assume value oracle
access to $f$ that, given $S \subseteq \mathcal{N}$, returns the value $f(S)$. We also assume membership oracles
for each of the $q$ matroids defining $\mathcal{M}^p$: given $S \subseteq \mathcal{N}_\ell$, there is an oracle for $\mathcal{M}_\ell$ that returns
whether or not $S \in \mathcal{I}_\ell$.

We first present a deterministic streaming algorithm Streaming-Greedy that yields an
$\Omega(1/p)$-approximation for monotone submodular functions, but performs poorly for
non-monotone functions. The primary motivation in presenting Streaming-Greedy is as a
building block for a randomized algorithm Randomized-Streaming-Greedy presented in
Section 4, and a deterministic algorithm Iterated-Streaming-Greedy presented in Section
5. The analysis for these algorithms relies crucially on properties of Streaming-Greedy.

Streaming-Greedy maintains an independent set $S \in \mathcal{I}$; as an element arrives in the
stream, it is either discarded or added to $S$ in exchange for a well-chosen subset of $S$. The
threshold for exchanging is tuned by two nonnegative parameters $\alpha$ and $\beta$. At the end of
the stream, Streaming-Greedy outputs $S$.

The overall strategy is similar to previous algorithms developed for matchings [FKM+05,
McG05] and intersections of matroids [Bad11] when $f$ is modular, and generalized by [CK14]
to monotone submodular functions. There are two main differences. One is the use of the
additive threshold $\alpha$. The second is the use of the incremental value $\nu$. By using incremental
value, the value of an element $e \in S$ is not fixed statically when $e$ is first added to $S$, and
increases over time as other elements are dropped from $S$. These two seemingly minor
modifications are crucial to the eventual algorithms for non-monotone functions.

We remark that Streaming-Greedy also fits the online preemptive model. 
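Streaming-Greedy($\alpha$, $\beta$)

    $S \leftarrow \emptyset$
    while (stream is not empty)
        $e \leftarrow$ next element in the stream
        $C \leftarrow$ Exchange-Candidates($S$, $e$)
        // $C$ satisfies $S - C + e \in \mathcal{I}$
        if $f_S(e) \ge \alpha + (1 + \beta) \sum_{c \in C} \nu(f, S, c)$
            $S \leftarrow S \setminus C + e$
    end while
    return $S$


Exchange-Candidates($S$, $e$)

    $C \leftarrow \emptyset$
    for $\ell = 1, \ldots, q$
        if $e \in \mathcal{N}_\ell$ and $(S + e) \cap \mathcal{N}_\ell \notin \mathcal{I}_\ell$
            $S_\ell \leftarrow S \cap \mathcal{N}_\ell$
            $X \leftarrow \{s \in S_\ell : (S_\ell - s + e) \in \mathcal{I}_\ell\}$
            // $X + e$ is a circuit
            $c_\ell \leftarrow \arg\min_{x \in X} \nu(f, S, x)$
            $C \leftarrow C + c_\ell$
        end if
    end for
    return $C$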

Outline of the analysis: Let $T \in \mathcal{I}$ be some fixed feasible set (we can think of $T$ as an
optimum set). In the offline analysis of the standard greedy algorithm one can show that
$f(S \cup T) \le (p+1) f(S)$, where $S$ is the output of greedy; for the monotone case this implies
that $f(S) \ge f(T)/(p+1)$. The analysis there hinges on the fact that each element of $T \setminus S$
is available to greedy when it chooses each element. In the streaming setting, this is no
longer the case, hence the need to remove elements in favor of new high-value elements.
To relate $S$, the final output, to $T$, we consider $U$, the set of all elements ever added to $S$.
The analysis proceeds in two steps.

First, we upper bound $f(U)$ by $f(S)$ as $f(U) \le \left(1 + \frac{1}{\beta}\right) f(S) - \frac{\alpha}{\beta} |U|$. Second, we
upper bound $f(T \cup U)$ as $f(T \cup U) \le k\alpha + \frac{(1+\beta)^2}{\beta} \cdot p \cdot f(S)$. For $\alpha = 0$, we obtain
$f(T \cup U) \le \frac{(1+\beta)^2}{\beta} \cdot p \cdot f(S)$, which yields $f(T) \le 4p f(S)$ when $f$ is monotone (for $\beta = 1$);
this gives the same bound as [CK14]. The crucial difference is that we are able to prove
an upper bound on the size of $U$, namely, $|U| \le \mathrm{OPT}/\alpha$; hence, if we choose the threshold
$\alpha$ to be $c\,\mathrm{OPT}/k$ for some parameter $c$, we have $|U| \le k/c$. This will play a critical role in
analyzing the non-monotone case in the subsequent sections that use Streaming-Greedy as
a black box. The upper bound on $|U|$ is achieved by the definition of $\nu$ and the threshold
$\alpha$; we stress that this is not as obvious as it may seem because the function $f$ can be
non-monotone and the marginal values can be negative.

Some notation for the analysis: 

• $S$ denotes the final set returned by Streaming-Greedy.

• For each element $e \in \mathcal{N}$, $S_e^-$ denotes the set held by $S$ just before $e$ is processed,
and $S_e^+$ the set held by $S$ just after $e$ is processed. Note that if $e$ is rejected, then
$S_e^- = S_e^+$.

• $U$ denotes the set of all elements added to $S$ at any point in the stream. Note that
$U = \bigcup_{e \in \mathcal{N}} S_e^+$.

• For $e \in \mathcal{N}$, $C_e = $ Exchange-Candidates($S_e^-$, $e$) $\subseteq S_e^-$ denotes the set of elements
that Streaming-Greedy considers exchanging for $e$. Observe that $\{C_u : u \in U\}$ forms
a partition of $U \setminus S$.

• For $e \in \mathcal{N}$, $\delta_e = f(S_e^+) - f(S_e^-)$ denotes the gain from processing $e$. Note that $\delta_e = 0$
for all $e \in \mathcal{N} \setminus U$, and $\sum_{e \in \mathcal{N}} \delta_e = f(S)$.


3.1. Relating f(U) to f(S) 

When Streaming-Greedy adds an element $e$ to $S$, it only compares the marginal $f_S(e)$ to
the incremental values in its exchange candidates $C$, and does not directly evaluate the gain
$f(S \setminus C + e) - f(S)$ realized by the exchange. The first lemma derives a lower bound for
this gain.

Lemma 4. Let $e \in U$ be added to $S$ when processed by Streaming-Greedy. Then

\[ \delta_e \ge \alpha + \beta \sum_{c \in C_e} \nu(f, S_e^-, c). \]

Proof. Since $e$ replaced $C_e$, by design of Streaming-Greedy, we have,

\[ f_{S_e^-}(e) \ge \alpha + (1 + \beta) \sum_{c \in C_e} \nu(f, S_e^-, c), \]

which, after rearranging, gives

\[ f_{S_e^-}(e) - \sum_{c \in C_e} \nu(f, S_e^-, c) \ge \alpha + \beta \sum_{c \in C_e} \nu(f, S_e^-, c). \]

To prove the lemma, it suffices to show that

\[ \delta_e \ge f_{S_e^-}(e) - \sum_{c \in C_e} \nu(f, S_e^-, c). \]

Let $Z = S_e^- \setminus C_e = S_e^+ - e$. Note that $S_e^+ = Z + e$ and $S_e^- = Z \cup C_e$. We have,

\begin{align*}
\delta_e &= f(Z + e) - f(Z + C_e) \\
 &= f_Z(e) - f_Z(C_e) && \text{by adding and subtracting } f(Z), \\
 &\ge f_{S_e^-}(e) - f_Z(C_e) && \text{by submodularity of } f, \\
 &= f_{S_e^-}(e) - \sum_{c \in C_e} \nu(f_Z, C_e, c) && \text{by definition of } \nu, \\
 &\ge f_{S_e^-}(e) - \sum_{c \in C_e} \nu(f, S_e^-, c) && \text{by Lemma 3 and } S_e^- = Z \cup C_e,
\end{align*}

as desired. □




One basic consequence of Lemma 4 is that every element in $U$ adds a positive and
significant amount $\alpha$ to the value of $S$. This will be crucial later, when taking $\alpha$ proportional
to OPT limits the size of $U$.

Lemma 5. For all $e \in U$, $\delta_e \ge \alpha$ and hence $|U| \le \mathrm{OPT}/\alpha$.

Proof. We claim that at any point in the algorithm, $\nu(f, S, e) \ge 0$ for all $e \in S$, from which
the lemma follows from Lemma 4 immediately.

When an element $e \in U$ is added to $S$, it has incremental value

\[ \nu(f, S_e^+, e) = f_{S_e^+ - e}(e) \ge f_{S_e^-}(e) \ge \alpha + (1 + \beta) \sum_{c \in C_e} \nu(f, S_e^-, c). \]

As the algorithm continues, elements preceding $e$ in $S$ may be deleted while elements after
$e$ are added, so $\nu(f, S, e)$ can only increase with time. □

Returning to the original task of bounding $f(U)$, the difference $U \setminus S$ is the set of deleted
elements, and the only handle on these elements is their incremental value at the point of
deletion. For $d \in U \setminus S$, let $e(d)$ be the element that $d$ was exchanged for; that is, $e(d) > d$
and $d \in S_{e(d)}^- \setminus S_{e(d)}^+ = C_{e(d)}$. For deleted elements $d \in U \setminus S$, the exit value $\chi(d)$ of $d$ is the
incremental value of $d$ evaluated when $d$ is removed from $S$, defined formally as

\[ \chi(d) = \nu(f, S_{e(d)}^-, d). \]

Here we bound the sum of exit values of $U \setminus S$.

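Lemma 6.

\[ \sum_{d \in U \setminus S} \chi(d) \le \frac{1}{\beta} \left( f(S) - \alpha |U| \right). \]

Proof. Indeed,

\begin{align*}
\sum_{d \in U \setminus S} \chi(d) &= \sum_{u \in U} \sum_{d \in C_u} \chi(d) && \text{since } \{C_u : u \in U\} \text{ partitions } U \setminus S, \\
 &\le \sum_{u \in U} \frac{1}{\beta} \left( \delta_u - \alpha \right) && \text{by Lemma 4,} \\
 &= \frac{1}{\beta} \left( f(S) - \alpha |U| \right).
\end{align*}

□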


Now we bound f(U). 
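Lemma 7.

\[ f(U) \le \left( 1 + \frac{1}{\beta} \right) \cdot f(S) - \frac{\alpha}{\beta} |U|. \]

Proof. Recall that for each element $d \in U \setminus S$, $e(d)$ denotes the element added in exchange for $d$.
We have,

\begin{align*}
f(U) - f(S) = f_S(U) &= \sum_{d \in U \setminus S} \nu(f_S, U, d) && \text{by Lemma 1,} \\
 &\le \sum_{d \in U \setminus S} \chi(d) && \text{by Lemma 2,} \\
 &\le \frac{1}{\beta} \cdot f(S) - \frac{\alpha}{\beta} |U| && \text{by Lemma 6.}
\end{align*}

□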

Remark 8. The preceding lemmas relating $f(U)$ and $f(S)$ do not rely on the structure of
$\mathcal{M}^p = (\mathcal{N}, \mathcal{I})$.


3.2. Upper bounding f(U U T) 

Let $T \in \mathcal{I}$ be any feasible solution. The goal is to upper bound $f(U \cup T)$. Here we use the
fact that $\mathcal{I}$ is a $p$-matchoid to frame an exchange argument between $T$ and $U$.

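Lemma 9. Let $T \in \mathcal{I}$ be a feasible solution disjoint from $U$. There exists a mapping
$\varphi : T \to 2^U$ such that

(a) Every $s \in S$ appears in the set $\varphi(t)$ for at most $p$ choices of $t \in T$.

(b) Every $d \in U \setminus S$ appears in the set $\varphi(t)$ for at most $(p - 1)$ choices of $t \in T$.

(c) For each $t \in T$,

\[ \sum_{c \in C_t} \nu(f, S_t^-, c) \le \sum_{d \in \varphi(t) \setminus S} \chi(d) + \sum_{s \in \varphi(t) \cap S} \nu(f, S, s). \qquad (1) \]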

Proof. The high level strategy is as follows. For each matroid $\mathcal{M}_\ell = (\mathcal{N}_\ell, \mathcal{I}_\ell)$ in the
$p$-matchoid $\mathcal{M}^p$, we construct a directed acyclic graph $G_\ell$ on $\mathcal{N}_\ell$, where a subset of $T$ forms the
source vertices and arrows preserve inequality (1). Applying Lemma 30 we get an injection
from a subset of $T$ into $U \cap \mathcal{N}_\ell$. With care, the union of these injections will produce the
mapping we seek.

Let us review and annotate the subroutine Exchange-Candidates($S$, $e$). For each matroid
$\mathcal{M}_\ell = (\mathcal{N}_\ell, \mathcal{I}_\ell)$ in which $S$ spans $e$ (i.e., $(S + e) \cap \mathcal{N}_\ell \notin \mathcal{I}_\ell$), we assemble a subset $X_{e,\ell} \subseteq S \cap \mathcal{N}_\ell$
that spans $e$ in $\mathcal{M}_\ell$. Of these, we choose the element $c_{e,\ell} \in X_{e,\ell}$ with the smallest incremental
value with respect to $S$.
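Fix a matroid $\mathcal{M}_\ell = (\mathcal{N}_\ell, \mathcal{I}_\ell)$. Let

\[ T_\ell = \{ t \in T \cap \mathcal{N}_\ell : (S_t^- + t) \cap \mathcal{N}_\ell \notin \mathcal{I}_\ell \} \]

be the set of elements in $T$ obstructed by $\mathcal{M}_\ell$, so to speak. For each $t \in T_\ell$ and each $x \in X_{t,\ell}$, add a directed
edge $(t, x)$ from $t$ to $x$. Observe that for all $t \in T_\ell$, $N_{G_\ell}(t) = X_{t,\ell}$ spans $t$.

Let

\[ D_\ell = \{ d : d = c_{e,\ell} \text{ for some } e \in U \} \]

be the elements deleted specifically for $\mathcal{M}_\ell$. Observe that $D_\ell \subseteq (U \setminus S) \cap \mathcal{N}_\ell$. For $d \in D_\ell$,
the set $Y_{d,\ell} = X_{e(d),\ell} - d + e(d)$ spans $d$, and for all $y \in Y_{d,\ell}$,

\[ \chi(d) = \nu(f, S_{e(d)}^-, d) \le \nu(f, S_{e(d)}^-, y) \le
\begin{cases}
\chi(y) & \text{if } y \in U \setminus S, \\
\nu(f, S, y) & \text{if } y \in S.
\end{cases} \]

For each $y \in Y_{d,\ell}$, add the directed edge $(d, y)$ from $d$ to $y$. Observe that for all $d \in D_\ell$,
$N_{G_\ell}(d) = Y_{d,\ell}$ spans $d$.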

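Clearly, $G_\ell$ is a directed acyclic graph. The elements of $T_\ell$ are sources in $G_\ell$, and the
elements of $D_\ell$ are never sinks. By Lemma 30, there exists an injection $\varphi_\ell$ from $T_\ell$ to
$(U \cap \mathcal{N}_\ell) \setminus D_\ell$ such that for each $t \in T_\ell$, there is a path in $G_\ell$ from $t$ to $\varphi_\ell(t)$. If we write out
the path $t \to x_1 \to \cdots \to x_z \to \varphi_\ell(t)$, we have $x_1, \ldots, x_z \in U \setminus S$ and

\[ \nu(f, S_t^-, c_{t,\ell}) \le \nu(f, S_t^-, x_1) \le \chi(x_1) \le \cdots \le \chi(x_z) \le
\begin{cases}
\chi(\varphi_\ell(t)) & \text{if } \varphi_\ell(t) \in U \setminus S, \\
\nu(f, S, \varphi_\ell(t)) & \text{if } \varphi_\ell(t) \in S.
\end{cases} \]

After constructing $\varphi_\ell$ for each matroid $\mathcal{M}_\ell$, define $\varphi : T \to 2^U$ by

\[ \varphi(t) = \bigcup_{\ell : t \in T_\ell} \varphi_\ell(t). \]

For each $t \in T$, we have

\[ \sum_{c \in C_t} \nu(f, S_t^-, c) \le \sum_{d \in \varphi(t) \setminus S} \chi(d) + \sum_{s \in S \cap \varphi(t)} \nu(f, S, s). \]

Each $u \in U$ belongs to at most $p$ matroids, so each $u \in U$ appears in $\varphi(t)$ for at most $p$
values of $t \in T$. Since $\{D_\ell\}$ covers $U \setminus S$, and $\varphi_\ell$ avoids $D_\ell$, each $d \in U \setminus S$ appears in $\varphi(t)$
at most $p - 1$ times. □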


Remark 10. A similar exchange lemma is given by Badanidiyuru for the intersection of $p$
matroids with modular weights [Bad11], and used implicitly by Chakrabarti and Kale in
their extension to submodular weights. Here we extend the argument to $p$-matchoids and
frame it in terms of incremental values.
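Now we bound $f(T \cup U)$.

Lemma 11. Let $T \in \mathcal{I}$ be an independent set. Then

\[ f(T \cup U) \le k\alpha + \frac{(1 + \beta)^2}{\beta} \cdot p \cdot f(S). \]

Proof. Let $T' = T \setminus U$. By submodularity, we have,

\[ f_U(T) \le \sum_{t \in T'} f_U(t) \le \sum_{t \in T'} f_{S_t^-}(t). \]

Since each $t \in T'$ is rejected, and $|T'| \le k$, we have

\[ \sum_{t \in T'} f_{S_t^-}(t) \le (1 + \beta) \sum_{t \in T'} \sum_{c \in C_t} \nu(f, S_t^-, c) + \alpha k. \]

Apply Lemma 9 to generate a mapping $\varphi : T' \to 2^U$. We have,

\begin{align*}
(1 + \beta) \sum_{t \in T'} \sum_{c \in C_t} \nu(f, S_t^-, c)
 &\le (1 + \beta) \sum_{t \in T'} \Big( \sum_{d \in \varphi(t) \setminus S} \chi(d) + \sum_{s \in S \cap \varphi(t)} \nu(f, S, s) \Big) && \text{by construction of } \varphi, \\
 &\le (1 + \beta)(p - 1) \sum_{d \in U \setminus S} \chi(d) + (1 + \beta) \, p \sum_{s \in S} \nu(f, S, s) && \text{by construction of } \varphi, \\
 &\le \frac{1 + \beta}{\beta} \cdot (p - 1) \cdot f(S) + p \cdot (1 + \beta) \cdot f(S) && \text{by Lemma 6.}
\end{align*}

To bound $f(U \cup T)$, we have,

\begin{align*}
f(U \cup T) &= f_U(T) + f(U) \\
 &\le k\alpha + \Big( \frac{(1 + \beta)^2}{\beta} \cdot p - \Big( 1 + \frac{1}{\beta} \Big) \Big) f(S) + f(U) && \text{by the above,} \\
 &\le k\alpha + \frac{(1 + \beta)^2}{\beta} \cdot p \cdot f(S) && \text{by Lemma 7,}
\end{align*}

as desired. □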
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3.3. A bound for the monotone case 

If $f$ is monotone, then $f(T) \le f(U \cup T)$ for any set $T$. If we take $T$ to be the set $T^*$
achieving OPT, $\alpha = 0$, and $\beta = 1$, we obtain the following.

Corollary 12. Let $\mathcal{M}^p = (\mathcal{N}, \mathcal{I})$ be a p-matchoid of rank $k$, and let $f : 2^{\mathcal{N}} \to \mathbb{R}_{\ge 0}$
be a nonnegative monotone submodular function. Given a stream over $\mathcal{N}$, Streaming-Greedy(0, 1)
is an online algorithm that returns a set $S \in \mathcal{I}$ such that

$$f(S) \ge \frac{1}{4p} \cdot \mathrm{OPT} = \frac{1}{4p} \cdot \max\{f(T) : T \in \mathcal{I}\}.$$
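To make the substitution explicit, the following worked computation takes the bound of Lemma 11 with $T = T^*$, $\alpha = 0$, and $\beta = 1$, and combines it with monotonicity:

```latex
\[
\mathrm{OPT} = f(T^*) \le f(U \cup T^*)
  \le k \cdot 0 + \frac{(1+1)^2}{1} \cdot p \cdot f(S) = 4p \cdot f(S)
\quad\Longrightarrow\quad
f(S) \ge \frac{1}{4p}\,\mathrm{OPT}.
\]
```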

Remark 13. Although these bounds match those of Chakrabarti and Kale for the intersection
of $p$ matroids [CK14], Randomized-Greedy requires more calls to the submodular value
oracle as the incremental value of an element in $S$ updates over time. That said, the number
of times a taken element $e \in U$ reevaluates its incremental value is proportional to the
number of times an element in $S$ is deleted, which is at most the rank of $\mathcal{M}^p$ and generally
considered small compared to the size of the stream. Furthermore, by taking $\alpha$ proportional
to OPT (a procedure for which is discussed in Section 4.6), we can limit the size of $U$ and
thereby the number of additional oracle calls generated by shifting incremental values.

4. Randomized Streaming Greedy 

Randomized-Streaming-Greedy adapts Streaming-Greedy to nonnegative submodular functions
by employing a randomized buffer $B$ to limit the probability that any element is
added to the running solution $S$. Like Streaming-Greedy, Randomized-Streaming-Greedy
maintains the invariant $S \in \mathcal{I}$. However, when a "good" element would have been added to
$S$ by Streaming-Greedy, it is instead placed in $B$. Once the number of elements in $B$ hits a
limit $K$, we pick one element in $B$ uniformly at random and add it to $S$ just as Streaming-Greedy
would.

Modifying $S$ may break the invariant that the buffer only contains good elements. Since
$f$ is submodular, the incremental value $\nu(f, S, e)$ of each $e \in S$ may increase if a preceding
element is deleted. Furthermore, the marginal value $f_S(b)$ of each buffered element $b \in B$
may decrease as elements are added to $S$. Thus, after modifying $S$, we reevaluate each
$b \in B$ and discard elements that are no longer good.

When the stream ends, we process the elements remaining in the buffer $B$ with an offline
algorithm to produce a second solution $S'$, and finally return the better of $S$ and $S'$.

Outline of the analysis. Let $T \in \mathcal{I}$ be an arbitrary independent set. Let $T' = T \setminus B$ be
the portion fully processed by the online portion and $T'' = T \cap B$ the remainder left over in
the buffer and processed offline.

Randomized-Streaming-Greedy(α, β)
    S ← ∅, B ← ∅
    while (stream is not empty)
        e ← next element in the stream
        if Is-Good(S, e) then B ← B + e
        if |B| = K then
            e ← uniformly random from B
            C ← Exchange-Candidates(S, e)
            B ← B − e, S ← (S \ C) + e
            for all e' ∈ B
                unless Is-Good(S, e')
                    B ← B − e'
        end if
    end while
    S' ← Offline(B)
    return argmax_{Z ∈ {S, S'}} f(Z)

Is-Good(S, e)
    C ← Exchange-Candidates(S, e)
    if f_S(e) > α + (1 + β) Σ_{e' ∈ C} ν(f, S, e')
        return TRUE
    else return FALSE


In Section 4.1, we first show that the analysis for $T'$ largely reduces to that of Section 3.
In particular, this gives us a bound on $f(U \cup T')$. In Section 4.2, we combine this with a
bound on $f(T'')$, guaranteed by the offline algorithm, to obtain an overall bound on $f(U \cup T)$
by $f(S)$. In Section 4.3, we finally bound $f(T)$ with respect to $f(U \cup T)$, leveraging the fact
that the buffer limits the probability of elements being added to $S$. In Section 4.4, we tie
together the analysis to bound $f(T)$ by $f(S)$ for fixed $\alpha$ and $\beta$.

The analysis reveals that the optimal choice for $\beta$ is 1, and that $\alpha$ should be chosen in
proportion to $\mathrm{OPT}/k$, where $k$ is the rank of $\mathcal{M}^p$. Since OPT is not known a priori,
in Section 4.6, we leverage a technique by Badanidiyuru et al. [BMKK14] that efficiently
guesses $\alpha$ to within a constant factor of the target value. The final algorithm is then
$\log k$ copies of Randomized-Streaming-Greedy run in parallel, each instance corresponding
to a "guess" for $\alpha$. One of these guesses is approximately correct, and attains the bound
asserted in Theorem 14.

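Theorem 14. Let $\mathcal{M}^p = (\mathcal{N}, \mathcal{I})$ be a p-matchoid of rank $k$, let $f : 2^{\mathcal{N}} \to \mathbb{R}_{\ge 0}$ be a nonnegative
submodular function over $\mathcal{N}$, and let $\epsilon > 0$ be fixed. Suppose there exists an algorithm for the
offline instance of the problem with approximation ratio $\gamma_p$. Then there exists a streaming
algorithm using total space $O(k \log k / \epsilon^2)$ that, given a stream over $\mathcal{N}$, returns a set $S \in \mathcal{I}$ such
that

$$(1 - \epsilon)\,\mathrm{OPT} \le \Bigl( 4p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)].$$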


Some notation for the analysis

• Let $S$ be the state of $S$ at the end of the stream.

• Let $U$ be the set of all elements to pass through $S$ during the stream.

• Let $B$ be the set held by $B$ at the end of the stream.

• Let $S = \arg\max_{Z \in \{S, S'\}} f(Z)$ be the set output by Randomized-Streaming-Greedy.

$S$, $U$, and $B$ are random sets depending on the random selection process from $B$. $S'$ is a
random variable depending on $B$, $S$, and the offline algorithm's own internal randomization.

Randomized-Streaming-Greedy(α, β)
    Let G be an instance of Streaming-Greedy(α, β)
    Let S_G refer to the set S maintained by G.
    B ← ∅                                   // B is a buffer of size K > 0
    while (stream is not empty)
        e ← next element in the stream
        if Is-Good(S_G, e) then B ← B + e   // buffers the "good" elements
        else send e downstream to G         // G will reject e
        if |B| = K then
            e ← an element from B picked uniformly at random   // e is "good"
            B ← B − e
            send e downstream to G          // G will add e to S_G
            for all e' ∈ B such that (not Is-Good(S_G, e'))
                B ← B − e'
                send e' downstream to G     // G will reject e'
            end for
        end if
    end while
    S' ← Offline(B)
    return argmax_{Z ∈ {S_G, S'}} f(Z)

Figure 2: Randomized-Streaming-Greedy rewritten with the deterministic portion
reduced to Streaming-Greedy.

4.1. Reducing to Streaming-Greedy 

If we set the buffer limit $K$ to 1, eliminating the role of the buffer $B$, then
Randomized-Streaming-Greedy reduces to the deterministic Streaming-Greedy algorithm from Section
3. In Figure 2, we refactor Randomized-Streaming-Greedy as a buffer placed upstream
from a running instance of Streaming-Greedy. The buffer only filters and reorders the
stream, and the analysis of Streaming-Greedy holds with respect to this scrambled stream.
More precisely, if $r$ denotes the random bits dictating $B$, then for any fixed $r$, the analysis of
Section 3 still applies. We recap the preceding analysis for Streaming-Greedy as it applies
here.
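The refactoring in Figure 2 can be sketched in Python as a generic buffer that sits in front of any downstream greedy. This is a minimal illustration under assumptions, not the authors' code: the downstream object is assumed to expose hypothetical methods `is_good(e)` and `offer(e)`, and `CardinalityGreedy` is a simplified stand-in for Streaming-Greedy (it handles only a cardinality constraint and never exchanges elements).

```python
import random
from typing import Callable, FrozenSet, Hashable, Iterable, List, Set


class CardinalityGreedy:
    """Hypothetical stand-in for Streaming-Greedy: keeps e iff |S| < k and f_S(e) > alpha."""

    def __init__(self, f: Callable[[FrozenSet], float], k: int, alpha: float):
        self.f, self.k, self.alpha = f, k, alpha
        self.S: Set[Hashable] = set()
        self.seen: List[Hashable] = []  # order in which elements reach this instance

    def is_good(self, e: Hashable) -> bool:
        gain = self.f(frozenset(self.S | {e})) - self.f(frozenset(self.S))
        return len(self.S) < self.k and gain > self.alpha

    def offer(self, e: Hashable) -> None:
        self.seen.append(e)
        if self.is_good(e):
            self.S.add(e)


def buffered_stream(stream: Iterable[Hashable], downstream, K: int,
                    rng: random.Random) -> List[Hashable]:
    """Filter and reorder the stream through a buffer of size K; return the leftover buffer."""
    buffer: List[Hashable] = []
    for e in stream:
        if downstream.is_good(e):
            buffer.append(e)              # hold back "good" elements
        else:
            downstream.offer(e)           # downstream will reject e
        if len(buffer) == K:
            chosen = buffer.pop(rng.randrange(len(buffer)))
            downstream.offer(chosen)      # downstream will accept the chosen element
            still_good = []
            for b in buffer:
                if downstream.is_good(b):
                    still_good.append(b)
                else:
                    downstream.offer(b)   # no longer good: downstream will reject b
            buffer = still_good
    return buffer
```

With `K = 1` the buffer forwards every element immediately and in order, which matches the observation that the randomized algorithm then collapses to the deterministic one.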
Lemma 15. Let $\alpha, \beta > 0$ be fixed parameters. For any $T \in \mathcal{I}$, we have

$$f(U \cup (T \setminus B)) \le \frac{(1+\beta)^2}{\beta} \cdot p \cdot f(S) + k\alpha.$$

Furthermore, $|U| \le \mathrm{OPT}/\alpha$.


4.2. Upper bounding $f(U \cup T)$ by $f(S)$

To bound $f(U \cup T)$, where $T \in \mathcal{I}$ is any independent set, we split $f_U(T)$ into the portion
$f_U(T \setminus B)$ that escapes the buffer, and the remainder $f_U(T \cap B)$ captured by the buffer. We
bound the former with Lemma 15 and the latter by guarantees for Offline to obtain the
following.


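Lemma 16. For any $T \in \mathcal{I}$, we have

$$\mathbb{E}[f(U \cup T)] \le k\alpha + \Bigl( \frac{(1+\beta)^2}{\beta} \cdot p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)].$$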


Proof. For ease of exposition, let $r$ denote the random bits that dictate the random selections
from $B$, and let us subscript variables by $r$ to highlight their dependence. Let $T'_r = T \setminus B_r$
and $T''_r = T \cap B_r$. For any fixed $r$, we have

$$f(U_r \cup T) = f(U_r) + f_{U_r}(T) \le f(U_r) + f_{U_r}(T'_r) + f_{U_r}(T''_r) \quad \text{by submodularity,}$$
$$\le k\alpha + \frac{(1+\beta)^2}{\beta} \cdot p \cdot f(S_r) + f_{U_r}(T''_r) \quad \text{by Lemma 15,}$$
$$\le k\alpha + \frac{(1+\beta)^2}{\beta} \cdot p \cdot f(S_r) + f(T''_r) \quad \text{by submodularity,}$$
$$\le k\alpha + \frac{(1+\beta)^2}{\beta} \cdot p \cdot f(S_r) + \frac{1}{\gamma_p} \mathbb{E}[f(S'_r)].$$

Here, the expectation surrounding $f(S'_r)$ is generated by the offline algorithm Offline,
which may be randomized (see, for example, Table 1). Taking expectations of both sides
over $r$, we have

$$\mathbb{E}[f(U \cup T)] \le k\alpha + \frac{(1+\beta)^2}{\beta} \cdot p \cdot \mathbb{E}[f(S)] + \frac{1}{\gamma_p} \mathbb{E}[f(S')]$$
$$\le k\alpha + \Bigl( \frac{(1+\beta)^2}{\beta} \cdot p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)],$$

as desired. □

4.3. Upper bounding $f(T)$ by $f(U \cup T)$

The remaining challenge is to bound $\mathbb{E}[f(U \cup T)]$ from below by some fraction of $f(T)$.
The following technical lemma, used similarly by Buchbinder et al., gives us a handle on
$\mathbb{E}[f(U \cup T)]$.

Lemma 17 ([BFNS14]). Let $f : 2^{\mathcal{N}} \to \mathbb{R}_{\ge 0}$ be a nonnegative submodular function. Suppose
$R$ is a random set according to a distribution $\mu$ on $2^{\mathcal{N}}$ where no element $e \in \mathcal{N}$ is picked
with probability more than $\rho$. Then $\mathbb{E}[f(R)] \ge (1 - \rho) f(\emptyset)$. Moreover, for any set $Y \subseteq \mathcal{N}$,
$\mathbb{E}[f(R \cup Y)] \ge (1 - \rho) f(Y)$.

In this case, $U$ is a random set, and we want to upper bound the probability of an element
$e \in \mathcal{N}$ appearing in $U$. Intuitively, taking $K$ large limits the probability of an element
being selected from the buffer, while by Lemma 15, taking $\alpha$ large decreases the number of
elements in $U$.

Lemma 18. For any element $e \in \mathcal{N}$,

$$P[e \in U] \le 1 - \Bigl( 1 - \frac{1}{K} \Bigr)^{\mathrm{OPT}/\alpha}.$$

Proof. An element is added to $S$ (and therefore $U$) if and only if it is selected from $B$ when
$|B|$ reaches $K$. By Lemma 15, we select from $B$ at most $\mathrm{OPT}/\alpha$ times, and each selection
is made uniformly and independently at random from $K$ elements. □

With this, we apply Lemma 17 to give the following.

Lemma 19. Let $T \in \mathcal{I}$ be a fixed independent set. Then

$$\mathbb{E}[f(U \cup T)] \ge \Bigl( 1 - \frac{1}{K} \Bigr)^{\mathrm{OPT}/\alpha} f(T).$$

Proof. By Lemma 18, for all $e \in \mathcal{N}$, $P[e \in U] \le 1 - (1 - 1/K)^{\mathrm{OPT}/\alpha} = \rho$. The claim
then follows from Lemma 17. □
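Lemma 17 can be sanity-checked numerically on a small instance. The sketch below is only an illustration: it uses a directed cut function (nonnegative and submodular, but not monotone) as a hypothetical example of $f$, draws $R$ by including each element independently with probability $\rho$, and computes the expectation exactly by enumeration. The helper `buffer_selection_bound` evaluates the right-hand side of Lemma 18 for a given number of selections.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Tuple


def directed_cut(edges: List[Tuple[int, int, float]], S: FrozenSet[int]) -> float:
    """Total weight of arcs leaving S; a nonnegative, non-monotone submodular function."""
    return sum(w for (u, v, w) in edges if u in S and v not in S)


def expected_union_value(f: Callable[[FrozenSet[int]], float], ground: Iterable[int],
                         rho: float, Y: FrozenSet[int]) -> float:
    """Exact E[f(R | Y)] when each ground element joins R independently with probability rho."""
    items = list(ground)
    n = len(items)
    total = 0.0
    for r in range(n + 1):
        weight = rho ** r * (1 - rho) ** (n - r)  # probability of one specific R of size r
        for R in combinations(items, r):
            total += weight * f(frozenset(R) | Y)
    return total


def buffer_selection_bound(K: int, selections: int) -> float:
    """Right-hand side of Lemma 18: 1 - (1 - 1/K)^selections."""
    return 1.0 - (1.0 - 1.0 / K) ** selections
```

For example, with arcs `[(0, 1, 1.0), (1, 2, 2.0), (2, 0, 1.5), (0, 3, 1.0)]`, `Y = {1}` and `rho = 0.3`, the computed expectation can be compared against `(1 - rho) * f(Y)`.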


4.4. Overall Analysis 

Tying together Lemma 16 and Lemma 19, we have the following. 


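Lemma 20. For any $T \in \mathcal{I}$, we have

$$\Bigl( 1 - \frac{\mathrm{OPT}}{\alpha K} \Bigr) f(T) \le k\alpha + \Bigl( \frac{(1+\beta)^2}{\beta} \cdot p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)].$$

Proof. Composing Lemma 16 and Lemma 19, we have

$$\Bigl( 1 - \frac{1}{K} \Bigr)^{\mathrm{OPT}/\alpha} f(T) \le k\alpha + \Bigl( \frac{(1+\beta)^2}{\beta} \cdot p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)].$$

By Bernoulli's inequality,

$$\Bigl( 1 - \frac{1}{K} \Bigr)^{\mathrm{OPT}/\alpha} \ge 1 - \frac{\mathrm{OPT}}{\alpha K},$$

and the claim follows. □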
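As a small numeric illustration of the Bernoulli step (the values $K = 100$ and $\mathrm{OPT}/\alpha = 10$ are chosen here purely for illustration and do not come from the analysis):

```latex
\[
\Bigl(1 - \frac{1}{100}\Bigr)^{10} \approx 0.9044
\;\ge\;
1 - \frac{10}{100} = 0.9 .
\]
```

The linear lower bound loses little when $\mathrm{OPT}/(\alpha K)$ is small, which is the regime the choice of $K$ in Section 4.5 aims for.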


4.5. A bound for approximate a 

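We would like to fix $\alpha$ as a constant fraction of OPT. For example, taking $\alpha = \epsilon\,\mathrm{OPT}/2k$,
where $\epsilon > 0$, and plugging into Lemma 20 gives the cleaner bound,

$$\Bigl( 1 - \frac{2k}{\epsilon K} \Bigr) f(T) \le \frac{\epsilon}{2} \mathrm{OPT} + \Bigl( \frac{(1+\beta)^2}{\beta} \cdot p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)].$$

However, the algorithm does not know OPT, and instead we will try to estimate OPT
approximately. Let us lay out the bound when $\alpha$ is within a factor of 2 of $\epsilon\,\mathrm{OPT}/2k$.

Lemma 21. Let $\epsilon > 0$ be a fixed parameter. If $\epsilon \cdot \mathrm{OPT}/4k \le \alpha \le \epsilon \cdot \mathrm{OPT}/2k$, then

$$\Bigl( 1 - \frac{4k}{\epsilon K} \Bigr) \mathrm{OPT} \le \frac{\epsilon}{2} \mathrm{OPT} + \Bigl( \frac{(1+\beta)^2}{\beta} \cdot p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)].$$

In particular, for $K = 4k/\epsilon^2$, we have

$$(1 - \epsilon)\,\mathrm{OPT} \le \Bigl( \frac{(1+\beta)^2}{\beta} \cdot p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)].$$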

4.6. Efficiently estimating a 

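Badanidiyuru et al. showed how to "guess" OPT space-efficiently and in a single pass
[BMKK14]. Let $z = \arg\max_{x \in \mathcal{N}} f(x)$. Clearly, $\mathrm{OPT} \ge f(z)$, and by submodularity of $f$,

$$\mathrm{OPT} = \max_{T \in \mathcal{I}} f(T) \le \max_{T \in \mathcal{I}} \sum_{t \in T} f(t) \le \max_{T \in \mathcal{I}} |T| \cdot f(z) \le k \cdot f(z).$$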

Fix $\epsilon > 0$, and suppose we run a parallel copy of Randomized-Streaming-Greedy for each
$\alpha$ in

$$A(z) = \{2^i : i \in \mathbb{Z}\} \cap \Bigl[ \frac{\epsilon}{4k} f(z), \; \frac{\epsilon}{2} f(z) \Bigr]$$

and at the end of the stream return the best solution among the $\log(2k)$ copies. For some
$\alpha \in A(z)$, we have

$$\frac{\epsilon}{4k} \cdot \mathrm{OPT} \le \alpha \le \frac{\epsilon}{2k} \cdot \mathrm{OPT},$$

where we get the approximation guarantee in Lemma 21.

This strategy requires two passes: one to identify $z$, and the second running $\log(2k)$ copies
of Randomized-Streaming-Greedy in parallel. We can reduce the number of passes to 1 by
updating $z$ and $A(z)$ on the fly. Enumerate the stream $e_1, e_2, \dots$, and for each $i$, let

$$z_i = \arg\max_{e_j \,:\, j \in [i]} f(e_j)$$

be the single element maximizing $f$ among the first $i$ elements seen thus far. $A(z_i)$ shifts
upward over the course of the stream as $z_i$ is updated. At each step $i$, we maintain parallel solutions
for each choice of $\alpha \in A(z_i)$, deleting instances with $\alpha$ below $A(z_i)$ and instantiating new
instances with larger values of $\alpha$.

To ensure correctness, it suffices to show that when we instantiate an instance of
Randomized-Streaming-Greedy for a new threshold $\alpha$, we haven't skipped over any elements
that we would want to include. Let $\alpha \in A(z_i) \setminus A(z_{i-1})$, i.e. $f(z_{i-1}) < \alpha \le f(z_i)$.
If $f(e_j) \ge \alpha$ for some $j < i$, then

$$\alpha \le f(e_j) \le f(z_{i-1}),$$

a contradiction.
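The bookkeeping for the guesses can be sketched as follows. This is an illustrative Python sketch under assumptions: `alpha_guesses` enumerates the powers of two in the interval defining $A(z)$, and `refresh_instances` (a hypothetical helper, with `new_instance` standing for whatever constructs a fresh copy of Randomized-Streaming-Greedy) drops instances whose threshold has fallen out of range and creates the missing ones when $f(z_i)$ grows.

```python
import math
from typing import Callable, Dict, List, TypeVar

Instance = TypeVar("Instance")


def alpha_guesses(f_z: float, k: int, eps: float) -> List[float]:
    """Powers of two in [eps * f_z / (4k), eps * f_z / 2], in increasing order."""
    if f_z <= 0:
        return []
    lo = eps * f_z / (4 * k)
    hi = eps * f_z / 2
    i = math.floor(math.log2(lo))
    guesses = []
    while 2.0 ** i <= hi:
        if 2.0 ** i >= lo:        # guard against rounding in the starting exponent
            guesses.append(2.0 ** i)
        i += 1
    return guesses


def refresh_instances(instances: Dict[float, Instance], f_z: float, k: int, eps: float,
                      new_instance: Callable[[float], Instance]) -> Dict[float, Instance]:
    """Keep instances whose alpha is still in A(z); start new ones for newly covered alphas."""
    wanted = alpha_guesses(f_z, k, eps)
    kept = {a: inst for a, inst in instances.items() if a in wanted}
    for a in wanted:
        if a not in kept:
            kept[a] = new_instance(a)
    return kept
```

Calling `refresh_instances` each time a new maximum singleton value is observed keeps roughly $\log(2k)$ instances alive at any moment, which is the source of the logarithmic factor in the space bound.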

4.7. Simpler algorithm and better bound for cardinality constraint 

When the p-matchoid is simply a cardinality constraint with rank $k$, we can do better. If
we set $\beta = \infty$ in Randomized-Streaming-Greedy(α, β), then the algorithm will only try to
add to $S$ without exchanging while $|S| < k$, effectively halting once we meet the cardinality
constraint $|S| = k$. In Figure 3, we rewrite Randomized-Streaming-Greedy(α, ∞) with
the unnecessary logic removed.
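As a rough illustration of the simplified procedure, the following Python sketch mirrors Randomized-Streaming-Greedy(α, ∞) for a cardinality constraint. It is a sketch under assumptions rather than the authors' implementation: `f` is a value oracle on frozensets, the stream is assumed to contain distinct elements, and `offline` is any user-supplied routine (hypothetical here) that returns a feasible subset of the leftover buffer.

```python
import random
from typing import Callable, FrozenSet, Hashable, Iterable, List, Optional, Set


def marginal(f: Callable[[FrozenSet], float], S: Set[Hashable], e: Hashable) -> float:
    """Marginal value f_S(e) = f(S + e) - f(S)."""
    return f(frozenset(S | {e})) - f(frozenset(S))


def randomized_streaming_greedy_cardinality(
    stream: Iterable[Hashable],
    f: Callable[[FrozenSet], float],
    k: int,
    alpha: float,
    K: int,
    offline: Callable[[List[Hashable]], Iterable[Hashable]],
    rng: Optional[random.Random] = None,
) -> FrozenSet:
    rng = rng or random.Random()
    S: Set[Hashable] = set()
    B: List[Hashable] = []                      # buffer of currently "good" elements
    for e in stream:
        if len(S) < k and marginal(f, S, e) > alpha:
            B.append(e)
        if len(B) == K:
            chosen = B.pop(rng.randrange(len(B)))   # uniform pick from the full buffer
            S.add(chosen)
            # Re-evaluate the buffer: marginals can only shrink as S grows.
            B = [b for b in B if marginal(f, S, b) > alpha]
    S_offline = frozenset(offline(B))           # second solution from the leftover buffer
    return max((frozenset(S), S_offline), key=f)
```

Because the buffer can only grow while $|S| < k$, the selection step never fires once the cardinality constraint is met, so the returned online solution has at most $k$ elements.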

Lemma 22. If $|S| = k$, then $f(S) \ge k\alpha$.

Lemma 23. If $|S| < k$, then for any set $T \subseteq \mathcal{N}$,

$$f(S \cup T) \le f(S) + f(T \cap B) + \alpha|T|.$$

Proof. Fix $t \in T \setminus (S \cup B)$, and let $S_t^-$ be the set held by $S$ when $t$ is processed. Since $t$ is
rejected, and $S_t^- \subseteq S$, we have

$$f_S(t) \le f_{S_t^-}(t) \le \alpha.$$

Summed over all $t \in T \setminus (S \cup B)$, we have

$$f_S(T \setminus B) \le \sum_{t \in T \setminus B} f_S(t) \le \alpha|T|.$$

Finally, we write

$$f(S \cup T) = f_S(T) + f(S) \le f_S(T \setminus B) + f_S(T \cap B) + f(S)$$
$$\le f(S) + f(T \cap B) + \alpha|T|$$

to attain the desired bound. □

Randomized-Streaming-Greedy(α, ∞)
    S ← ∅, B ← ∅
    while (stream is not empty)
        e ← next element in the stream
        if |S| < k and f_S(e) > α then
            B ← B + e
        if |B| = K then
            e ← uniformly random from B
            B ← B − e, S ← S + e
            for all e' ∈ B s.t. f_S(e') ≤ α
                B ← B − e'
        end if
    end while
    S' ← Offline(B)
    return argmax_{Z ∈ {S, S'}} f(Z)

Figure 3


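Lemma 24. For $K = k/\epsilon$, and $\alpha$ such that $(1 - \epsilon)\,\mathrm{OPT} \le (2 + e) k\alpha \le (1 + \epsilon)\,\mathrm{OPT}$, we
have

$$\mathbb{E}[f(S)] \ge \frac{1 - 2\epsilon}{2 + e} \cdot \mathrm{OPT}.$$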


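Proof. Let $T \subseteq \mathcal{N}$ be an optimal set with $|T| = k$ and $\mathrm{OPT} = f(T)$.

If $|S| = k$, then the claim follows from Lemma 22. Otherwise, $|S| < k$ and by Lemma 23, we
have

$$f(S \cup T) \le f(S) + f(T \cap B) + k\alpha.$$

By Lemma 17, we also have

$$\mathbb{E}[f(S \cup T)] \ge \Bigl( 1 - \frac{1}{K} \Bigr)^{k} f(T) \ge \Bigl( 1 - \frac{k}{K} \Bigr) f(T) \ge (1 - \epsilon) f(T).$$

Finally, by the bound for Offline, where $e$ denotes the base of the natural logarithm, we have

$$f(T \cap B) \le e \, \mathbb{E}[f(S')].$$

Together, we have

$$(1 - \epsilon) f(T) \le \mathbb{E}[f(S) + e f(S')] + k\alpha \le (1 + e)\, \mathbb{E}[f(S)] + k\alpha.$$

Solving for $\mathbb{E}[f(S)]$ and plugging in $f(T) = \mathrm{OPT}$ and $k\alpha \le \frac{1+\epsilon}{2+e} \mathrm{OPT}$, we have

$$\mathbb{E}[f(S)] \ge \frac{1}{1 + e} \bigl( (1 - \epsilon)\,\mathrm{OPT} - k\alpha \bigr) \ge \frac{1 - 2\epsilon}{2 + e}\, \mathrm{OPT},$$

as desired. □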


The preceding analysis reveals that the appropriate choice for $\alpha$ is $\mathrm{OPT}/((2 + e)k)$, where
$\mathrm{OPT} = \max\{f(T) : |T| \le k\}$ is the maximum value attainable by a set of $k$ elements, and
that a sufficiently large choice for $K$ is $k/\epsilon$. As in Section 4.6, we can efficiently approximate $\alpha$
by guessing $\alpha$ in increasing powers of $(1 + \epsilon)$, maintaining at most $O(\log_{1+\epsilon} k) = O(\epsilon^{-1} \log k)$
instances of Randomized-Streaming-Greedy(α, ∞) at any instant. The resulting bound is
stronger than previously derived for a 1-matchoid.

Theorem 25. Let $f : 2^{\mathcal{N}} \to \mathbb{R}_{\ge 0}$ be a nonnegative submodular function over a ground set $\mathcal{N}$,
and let $\epsilon > 0$ be fixed. Then there exists a streaming algorithm using total space $O(k \log k / \epsilon^2)$
that, given a stream over $\mathcal{N}$, returns a set $S$ such that $|S| \le k$ and $f(S) \ge \frac{1 - \epsilon}{2 + e} \cdot \mathrm{OPT}$, where
$\mathrm{OPT} = \max\{f(T) : |T| \le k\}$ is the maximum value attainable by a set of $k$ elements.


5. A Deterministic Algorithm via Iterated Greedy 

Gupta et al. gave a framework that takes an offline algorithm for maximizing a monotone
submodular function and, by running the algorithm as a black box multiple times over
different ground sets, produces an algorithm for the nonnegative case [GRST10]. Here we
adapt the framework to the streaming setting, employing Streaming-Greedy as the black box
for the monotone case.

We first present Iterated-Streaming-Greedy as an algorithm making two passes over
$\mathcal{N}$. In the first pass we run Streaming-Greedy(α, β) over $\mathcal{N}$ as usual. Let $S_1$ denote the
set output, and $U_1$ the set of all elements added to $S_1$ at any intermediate point of the
algorithm, as per Section 3. In the second pass, we run Streaming-Greedy(0, β) over the
set $\mathcal{N} \setminus U_1$ of elements that were immediately rejected in the first pass to produce another
independent set $S_2$. Lastly, we run our choice of offline algorithm over $U_1$ to produce a third
independent set $S_3 \in \mathcal{I}$. At the end, return the best set among $S_1$, $S_2$, and $S_3$.

If we pipeline the two instances of Streaming-Greedy, then Iterated-Streaming-Greedy
becomes a true streaming algorithm with only one pass over $\mathcal{N}$. When the first instance
rejects an element $e$ outright, $e$ is sent downstream to the second instance of Streaming-Greedy.
Note that the running time of Offline depends on the size of its input $U_1$, which
by Lemma 5 is at most $\mathrm{OPT}/\alpha$.

Iterated-Streaming-Greedy(α, β, N)
    // run Streaming-Greedy over N
    S_1 ← Streaming-Greedy(α, β, N)
    // U_1 denotes the set U in Section 3.
    S_2 ← Streaming-Greedy(0, β, N \ U_1)
    S_3 ← Offline(U_1)
    return argmax_{Z ∈ {S_1, S_2, S_3}} f(Z)
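The two-pass form of the procedure can be sketched in Python with the greedy and offline routines passed in as black boxes. Everything named here is an assumption for illustration: `greedy_alpha` and `greedy_zero` are callables returning a pair `(S, U)`, `offline` maps $U_1$ to a feasible set, and `threshold_greedy` is a simplified stand-in for Streaming-Greedy under a cardinality constraint (it never deletes, so its $U$ equals its $S$). The sketch stores the stream in a list for clarity, which the pipelined one-pass version avoids.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Set, Tuple

Greedy = Callable[[List[Hashable]], Tuple[Set[Hashable], Set[Hashable]]]


def iterated_streaming_greedy(
    stream: Iterable[Hashable],
    f: Callable[[FrozenSet], float],
    greedy_alpha: Greedy,
    greedy_zero: Greedy,
    offline: Callable[[Set[Hashable]], Iterable[Hashable]],
) -> FrozenSet:
    elements = list(stream)                       # two-pass version: keep the stream around
    S1, U1 = greedy_alpha(elements)               # first pass with threshold alpha
    S2, _ = greedy_zero([e for e in elements if e not in U1])  # second pass on rejected elements
    S3 = offline(U1)                              # offline algorithm on everything ever taken
    return max((frozenset(S1), frozenset(S2), frozenset(S3)), key=f)


def threshold_greedy(elements: List[Hashable], f: Callable[[FrozenSet], float],
                     k: int, alpha: float) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Hypothetical stand-in for Streaming-Greedy: add e while |S| < k and f_S(e) > alpha."""
    S: Set[Hashable] = set()
    for e in elements:
        if len(S) < k and f(frozenset(S | {e})) - f(frozenset(S)) > alpha:
            S.add(e)
    return S, set(S)                              # nothing is ever deleted, so U = S
```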


Lemma 26. Let $T \in \mathcal{I}$ be any independent set. Then

$$\Bigl( 2 \cdot \frac{(1+\beta)^2}{\beta} \cdot p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)] \ge f(T) - k\alpha.$$

Furthermore, if Offline is a deterministic algorithm, then Iterated-Streaming-Greedy is
deterministic and the above holds without taking expectations.


Proof. By submodularity, we have

$$f(U_1 \cup T) + f(U_2 \cup (T \setminus U_1)) \ge f(U_1 \cup U_2 \cup T) + f(T \setminus U_1) \qquad (2)$$

and

$$f(T \setminus U_1) + f(U_1 \cap T) \ge f(T) + f(\emptyset). \qquad (3)$$

By nonnegativity of $f$ and equations (2) and (3), we have

$$f(T) \le f(T) + f(\emptyset) + f(U_1 \cup U_2 \cup T) \le f(U_1 \cup T) + f(U_2 \cup (T \setminus U_1)) + f(U_1 \cap T).$$

By Lemma 11, we have $f(U_1 \cup T) \le \frac{(1+\beta)^2}{\beta} \cdot p \cdot f(S_1) + k\alpha$ and $f(U_2 \cup (T \setminus U_1)) \le \frac{(1+\beta)^2}{\beta} \cdot p \cdot f(S_2)$, and $f(U_1 \cap T) \le \frac{1}{\gamma_p} \cdot f(S_3)$ by assumption. Plugging into the above, and
noting that $f(S_1), f(S_2), f(S_3) \le f(S)$ gives the bounds we seek. □


Corollary 27. Let $\epsilon > 0$ be given. If $\alpha \le \epsilon\,\mathrm{OPT}/k$, then

$$\Bigl( 2 \cdot \frac{(1+\beta)^2}{\beta} \cdot p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)] \ge (1 - \epsilon)\,\mathrm{OPT},$$

and the inequality holds without taking expectations if Offline is a deterministic algorithm.

The appropriate value of $\alpha$ is guessed efficiently exactly as described in Section 4.6. Here,
if $|U_1|$ grows too large in an instance of Iterated-Streaming-Greedy(α, β) for some fixed
$\alpha$, then $\alpha$ must be too small and we can terminate the instance immediately.

Theorem 28. Let $\mathcal{M}^p = (\mathcal{N}, \mathcal{I})$ be a p-matchoid of rank $k$, let $f : 2^{\mathcal{N}} \to \mathbb{R}_{\ge 0}$ be a
nonnegative submodular function over $\mathcal{N}$, and let $\epsilon > 0$ be fixed. Suppose there exists an offline
algorithm for finding the largest value independent set in a p-matchoid with approximation
ratio $\gamma_p$. Then there exists a streaming algorithm using total space $O(k \log k / \epsilon)$ that, given a
stream over $\mathcal{N}$, returns a set $S \in \mathcal{I}$ such that

$$\Bigl( 8p + \frac{1}{\gamma_p} \Bigr) \mathbb{E}[f(S)] \ge (1 - \epsilon)\,\mathrm{OPT}.$$

If the offline algorithm is deterministic, then the claimed algorithm is deterministic and the
above bound holds without expectation.


References 

[Bad11] A. Badanidiyuru Varadaraja. Buyback problem: Approximate matroid intersection
with cancellation costs. In Proc. 38th Internat. Colloq. Automata Lang.
Prog. (ICALP), volume 1, pages 379-390, 2011.

[BFNS12] N. Buchbinder, M. Feldman, J. Naor, and R. Schwartz. A tight linear time
(1/2)-approximation for unconstrained submodular maximization. In Proc. 53rd
Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 649-658, 2012.

[BFNS14] N. Buchbinder, M. Feldman, J. Naor, and R. Schwartz. Submodular maximization
with cardinality constraints. In Proc. 25th ACM-SIAM Sympos. Discrete
Algs. (SODA), pages 1433-1452, 2014.

[BFS15] N. Buchbinder, M. Feldman, and R. Schwartz. Online submodular maximization
with preemption. In Proc. 26th ACM-SIAM Sympos. Discrete Algs. (SODA),
pages 1202-1216, 2015.

[BHZ13] M. Bateni, M. Hajiaghayi, and M. Zadimoghaddam. Submodular secretary
problem and extensions. ACM Trans. Algs., 9(4):32:1-32:23, October 2013.

[BIK07] M. Babaioff, N. Immorlica, and R. Kleinberg. Matroids, secretary problems, and
online mechanisms. In Proc. 18th ACM-SIAM Sympos. Discrete Algs. (SODA),
pages 434-443, Philadelphia, PA, USA, 2007.

[BKNS12] N. Bansal, N. Korula, V. Nagarajan, and A. Srinivasan. Solving packing
integer programs via randomized rounding with alterations. Theo. Comput.,
8(1):533-565, 2012.


[BMKK14] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming
submodular optimization: Massive data summarization on the fly. In Proc. 20th
ACM Conf. Knowl. Disc. and Data Mining (KDD), pages 671-680, 2014.

[BV14] A. Badanidiyuru and J. Vondrák. Fast algorithms for maximizing submodular
functions. In Proc. 25th ACM-SIAM Sympos. Discrete Algs. (SODA), pages
1497-1514, 2014.

[CCPV07] G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák. Maximizing a submodular
set function subject to a matroid constraint (extended abstract). In Proc. 12th
Int. Conf. Int. Prog. Comb. Opt. (IPCO), pages 182-196, 2007.

[CCPV11] G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák. Maximizing a monotone
submodular function subject to a matroid constraint. SIAM J. Comput.,
40(6):1740-1766, 2011.

[CJV15] C. Chekuri, T.S. Jayram, and J. Vondrák. On multiplicative weight updates for
concave and submodular function maximization. In Proceedings of ITCS, 2015.

[CK14] A. Chakrabarti and S. Kale. Submodular maximization meets streaming: matchings,
matroids and more. In Proc. 17th Int. Conf. Int. Prog. Comb. Opt. (IPCO),
pages 210-221, 2014.

[CVZ11] C. Chekuri, J. Vondrák, and R. Zenklusen. Submodular function maximization
via the multilinear relaxation and contention resolution schemes. In Proc. 43rd
Annu. ACM Sympos. Theory Comput. (STOC), pages 783-792, 2011.

[CWW10] W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent
viral marketing in large-scale social networks. In Proc. 16th ACM Conf. Knowl.
Disc. and Data Mining (KDD), pages 1029-1038, 2010.

[CWY09] W. Chen, Y. Wang, and S. Yang. Efficient influence maximization in social
networks. In Proc. 15th ACM Conf. Knowl. Disc. and Data Mining (KDD),
pages 199-208, New York, NY, USA, 2009.

[DKR13] A. Dasgupta, R. Kumar, and S. Ravi. Summarization through submodularity
and dispersion. In Proc. 51st Ann. Meet. Assoc. for Comp. Ling. (ACL),
volume 1, pages 1014-1022, 2013.

[DV12] S. Dobzinski and J. Vondrák. From query complexity to computational complexity.
In Proc. 44th Annu. ACM Sympos. Theory Comput. (STOC), pages
1107-1116, 2012.

[FKM+05] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph
problems in a semi-streaming model. Theo. Comp. Sci., 348(2-3):207-216, 2005.


[FNS11] M. Feldman, J. Naor, and R. Schwartz. A unified continuous greedy algorithm
for submodular maximization. In Proc. 52nd Annu. IEEE Sympos. Found.
Comput. Sci. (FOCS), pages 570-579, 2011.

[FNSW11] M. Feldman, J. Naor, R. Schwartz, and J. Ward. Improved approximations for
k-exchange systems. In Proc. 19th Annu. European Sympos. Algs. (ESA), pages
784-798, 2011.

[FNW78] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. An analysis of approximations
for maximizing submodular set functions - II. Math. Prog. Studies, 8:73-87,
1978.

[Fuj05] S. Fujishige. Submodular functions and optimization, volume 58. Elsevier, 2005.

[FW14] Y. Filmus and J. Ward. Monotone submodular maximization over a matroid
via non-oblivious local search. SIAM J. Comput., 43(2):514-542, 2014.

[GBL11] A. Goyal, F. Bonchi, and L. V. S. Lakshmanan. A data-based approach to social
influence maximization. Proc. VLDB Endow., 5(1):73-84, September 2011.

[GRST10] A. Gupta, A. Roth, G. Schoenebeck, and K. Talwar. Constrained non-monotone
submodular maximization: Offline and secretary algorithms. In Proc. 6th Int.
Conf. Internet and Network Economics (WINE), pages 246-257, 2010.

[IJB13] R. Iyer, S. Jegelka, and J. Bilmes. Fast semidifferential-based submodular function
optimization. In Proc. 30th Int. Conf. Mach. Learning (ICML), volume 28,
pages 855-863, 2013.

[KKT03] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence
through a social network. In Proc. 9th ACM Conf. Knowl. Disc. and Data
Mining (KDD), pages 137-146, New York, NY, USA, 2003.

[KMVV13] R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani. Fast greedy algorithms
in mapreduce and streaming. In Proc. 25th Ann. ACM Sympos. Parallelism Alg.
Arch. (SPAA), pages 1-10, 2013.

[KST13] A. Kulik, H. Shachnai, and T. Tamir. Approximations for monotone and
nonmonotone submodular maximization with knapsack constraints. Math. Oper.
Res., 38(4):729-739, 2013.

[LB11] H. Lin and J. Bilmes. A class of submodular functions for document summarization.
In Proc. 49th Ann. Meet. Assoc. Comput. Ling.: Human Lang. Tech.
(HLT), volume 1, pages 510-520, 2011.

[LKG+07] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance.
Cost-effective outbreak detection in networks. In Proc. 13th ACM Conf. Knowl.
Disc. and Data Mining (KDD), pages 420-429, New York, NY, USA, 2007.


[LMNS10] J. Lee, V. S. Mirrokni, V. Nagarajan, and M. Sviridenko. Maximizing nonmonotone
submodular functions under matroid or knapsack constraints. SIAM J.
Discrete Math., 23(4):2053-2078, 2010.

[LSV10] J. Lee, M. Sviridenko, and J. Vondrák. Submodular maximization over multiple
matroids via generalized exchange properties. Math. Oper. Res., 35:795-806,
2010.

[McG05] A. McGregor. Finding graph matchings in data streams. In 8th Intl. Work.
Approx. Algs. Combin. Opt. Problems, pages 170-181, 2005.

[NWF78] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations
for maximizing submodular set functions - I. Math. Prog., 14(1):265-294, 1978.

[Sch03] A. Schrijver. Combinatorial optimization: polyhedra and efficiency, volume 24.
Springer Verlag, 2003.

[SS13] L. Seeman and Y. Singer. Adaptive seeding in social networks. In Proc. 54th
Annu. IEEE Sympos. Found. Comput. Sci. (FOCS), pages 459-468, 2013.

[SSSJ12] R. Sipos, A. Swaminathan, P. Shivaswamy, and T. Joachims. Temporal corpus
summarization using submodular word coverage. In Proc. 21st ACM Int. Conf.
Inf. and Know. Management (CIKM), pages 754-763, 2012.

[Von13] J. Vondrák. Symmetry and approximability of submodular maximization problems.
SIAM J. Comput., 42(1):265-304, 2013.


A. Exchange lemmas for matroids 

Lemma 29. Let $\mathcal{M} = (\mathcal{N}, \mathcal{I})$ be a matroid, let $S, T \subseteq \mathcal{N}$ be two subsets, and let $x, y \in \mathcal{N}$
be two elements. If $S$ spans $x$, and $T + x$ spans $y$, then $S \cup T$ spans $y$.

Proof. It suffices to assume that $S$ and $T$ are independent sets.

Extend $S$ to a base $B$ in $S \cup T$. Since $B$ extends $S$, $B$ spans $x$, and $B$ is a base in
$(S \cup T) + x$. Since $(S \cup T) + x$ spans $y$, $B$ spans $y$. □

Let $G$ be a directed graph. For $v \in V(G)$, let $N^+(v) = \{w : (v, w) \in E(G)\}$ denote the set
of outgoing neighbors of $v$, and $N^-(v) = \{u : (u, v) \in E(G)\}$ the set of incoming neighbors
of $v$.

The following lemma is implicit in Badanidiyuru [Bad11].

Lemma 30. Let $\mathcal{M} = (\mathcal{N}, \mathcal{I})$ be a matroid, and $G$ a directed acyclic graph over $\mathcal{N}$ such
that for every non-sink vertex $e \in \mathcal{N}$, the outgoing neighbors $N^+(e)$ of $e$ span $e$. Let $I \in \mathcal{I}$
be an independent set such that no path in $G$ goes from one element in $I$ to another. Then
there exists an injection from $I$ to sink vertices in $G$ such that each $e \in I$ maps into an
element reachable from $e$.

Proof. Restricting our attention to elements reachable from $I$ in $G$, let us assume that the
elements of $I$ are sources (i.e., have no incoming neighbors) in $G$. Let us call an element
$e \in \mathcal{N}$ an "internal" element if it is neither a sink nor a source in $G$.

We prove the claim by induction on the number of internal vertices reachable from $I$. In the base
case, the outgoing neighbors of each $i \in I$ are all sinks, and $G$ is bipartite. For any subset
$J \subseteq I$, $N^+(J) = \bigcup_{i \in J} N^+(i)$ spans $J$. $J$ is independent, so we have $|J| \le |N^+(J)|$. Thus,
by Hall's matching theorem, there exists an injection $I \to N^+(I)$ such that each $i \in I$ maps
into $N^+(i)$.

In the general case, let $e \in \mathcal{N} \setminus I$ be an internal vertex in $G$. Consider the graph $H$
removing $e$ and preserving all paths through $e$, defined by

$$V(H) = V(G) - e = \mathcal{N} - e,$$
$$E(H) = \bigl( E(G) \setminus \{(a, b) : a = e \text{ or } b = e\} \bigr) \cup \{(a, b) : (a, e), (e, b) \in E(G)\}.$$

$H$ has one less internal vertex than $G$, the same sink vertices as $G$, and a vertex $a \in V(H)$ is
reachable from $i \in I$ in $H$ iff it is reachable from $i$ in $G$. For any vertex $a \in N_G^-(e)$ that had
an outgoing arc into $e$, we have

$$N_H^+(a) = (N_G^+(a) - e) \cup N_G^+(e),$$

which spans $a$ by Lemma 29. Since every other vertex has the same outgoing arcs, we
conclude that $N_H^+(a)$ spans $a$ for any non-sink vertex $a$ of $H$.

By induction, there exists an injection from $I$ into the sinks of $H$ such that every $i \in I$
is mapped to a sink vertex reachable from $i$ in $H$. By construction, these vertices are also
reachable sinks in $G$, as claimed. □
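The vertex-elimination step used in the induction can be written out concretely. The Python sketch below is an illustration only (the function names are hypothetical): it represents a directed graph as a set of arcs, removes an internal vertex `e`, and reconnects each of its incoming neighbors to each of its outgoing neighbors so that every path through `e` is preserved.

```python
from typing import Hashable, Set, Tuple

Arc = Tuple[Hashable, Hashable]


def bypass_vertex(edges: Set[Arc], e: Hashable) -> Set[Arc]:
    """Remove vertex e and add an arc (a, b) for every pair of arcs (a, e), (e, b)."""
    preds = {a for (a, b) in edges if b == e}          # incoming neighbors of e
    succs = {b for (a, b) in edges if a == e}          # outgoing neighbors of e
    kept = {(a, b) for (a, b) in edges if a != e and b != e}
    return kept | {(a, b) for a in preds for b in succs}


def sinks(edges: Set[Arc]) -> Set[Hashable]:
    """Vertices that appear in some arc but have no outgoing arc."""
    sources = {a for (a, _) in edges}
    vertices = sources | {b for (_, b) in edges}
    return vertices - sources
```

For instance, bypassing vertex 2 in the graph with arcs (1,2), (2,3), (2,4), (1,5) leaves arcs (1,3), (1,4), (1,5), and the set of sinks is unchanged, as the proof requires.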
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